Abstract

The Box-Cox transformation can sometimes yield noticeable improvements in
model simplicity, variance homogeneity and precision of estimation, such as in
modeling and forecasting age-specific fertility. Despite its importance, only a
few studies have focused on the optimal selection of the Box-Cox transformation
parameter in demographic forecasting. A simple method is proposed for selecting
the optimal Box-Cox transformation parameter, along with an algorithm based on
an in-sample forecast error measure. Illustrated by Australian age-specific
fertility, the out-of-sample accuracy of a forecasting method can be improved
with the selected Box-Cox transformation parameter. Furthermore, the log
transformation is not adequate for modeling and forecasting age-specific
fertility. It is recommended to embed the selection of the Box-Cox
transformation parameter into statistical analysis of age-specific demographic
data, in order to fully capture forecast uncertainties.

1 Introduction


In the demographic literature, forecasting methods for age-specific fertility can be
generally grouped into parametric, semi-parametric and nonparametric models. Parametric
models used in forecasting include the beta, gamma, double exponential and Hadwiger
functions (Knudsen et al., 1993; Thompson et al., 1989; Congdon, 1990, 1993; Keilman
and Pham, 2000), while semi-parametric models include the Coale-Trussell and
Relational Gompertz models (Coale and Trussell, 1974; Brass, 1981; Murphy, 1982; Booth,
1984; Zeng et al., 2000). The use of these models is variously limited by parameter
uninterpretability, over-parameterization and the need for vector autoregression; structural
change also limits their utility, especially where vector autoregression is involved (Booth,
2006). To address this problem, nonparametric methods use a dimension-reduction
technique, such as principal components analysis, to linearly transform age-specific fertility
rates to extract a series of time-varying indexes to be forecast (see Bozik and Bell, 1987;
Bell, 1992; Lee, 1993; Hyndman and Ullah, 2007).

The Box-Cox transformation can sometimes yield noticeable improvements in model
simplicity, variance homogeneity and precision of estimation. Despite the rapid
development of demographic forecasting models, only a few studies have focused on the
optimal selection of the Box-Cox transformation parameter, with the notable exception
of Hyndman and Booth (2008). As noted in early work by Box and Cox (1964) and Box
(1988), the careful selection of a data transformation is often treated as a prerequisite
before any serious modeling takes place.

An example of data transformation is the log transformation for modeling and
forecasting age-specific mortality. Such a transformation allows researchers to visualize and model
patterns associated with the so-called “accident bump” and to exploit near-linearities in
the log mortality rates for the ages 40 to 80 years. The log transformation is a special
case of the Box-Cox transformation, which can be defined as


$$
z_{t,i} =
\begin{cases}
\dfrac{1}{\lambda}\left[(f_{t,i})^{\lambda} - 1\right] & \text{if } \lambda \neq 0, \\
\ln(f_{t,i}) & \text{if } \lambda = 0,
\end{cases}
$$

where $f_{t,i} > 0$ denotes the observed age-specific data at age $i$ in year $t$, whereas
$z_{t,i}$ denotes the transformed data, and $\lambda$ is the transformation parameter. For
instance, when $\lambda = 1$ the transformation is essentially the identity (a shift by
one), and when $\lambda = 0$ it is the logarithm. In this work, we restrict $\lambda$ to
lie in the unit interval (see also Hyndman and Booth, 2008).
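The transformation and its inverse are short enough to write out directly. The following is a minimal Python sketch of the definition above; the function names and the example rate are illustrative assumptions and do not come from the paper, which works in R.

```python
import math


def box_cox(f, lam):
    """Box-Cox transform of a strictly positive rate f with parameter lam."""
    if f <= 0:
        raise ValueError("the Box-Cox transformation requires strictly positive data")
    if lam == 0:
        return math.log(f)
    return (f ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value z back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


rate = 0.12  # hypothetical fertility rate, not taken from the data set
for lam in (0.0, 0.46, 1.0):
    z = box_cox(rate, lam)
    # The round trip should recover the original rate for every lam.
    print(lam, z, inverse_box_cox(z, lam))
```

Because the transformation is monotone increasing for every value of the parameter, quantiles on the transformed scale map to quantiles on the original scale under the inverse.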

We propose a simple and instructive way of selecting the optimal Box-Cox
transformation parameter based on an in-sample forecast error measure, and demonstrate this
idea in the context of modeling and forecasting age-specific fertility. The effect of the
Box-Cox transformation on fertility is mainly manifested by a different shape of the age
profile. With the optimal transformation parameter, the age profile of the transformed data
may reveal age patterns that are not obvious in the raw data.

This paper is organized as follows. In Section 2, we present the Australian age-specific 
fertility from 1921 to 2006. In Section 3, we present the methodology and optimization 
algorithm. Results are collated in Section 4. Section 5 concludes, along with some 
thoughts on how the method developed here might be further extended. 


2 Data and design 

2.1 Data set 

We consider annual Australian age-specific fertility rates from 1921 to 2006. The data
set has been obtained from the Australian Bureau of Statistics (Cat. No. 3105.0.65.001,
Table 38), and is also available in the rainbow package (Shang and Hyndman, 2013) in
R (R Core Team, 2014). The data consist of annual fertility rates by single-year age of
mother aged from 15 to 49 years. A graphical data display is given in Figure 1. From
the rainbow plot in Figure 1a, we see the phenomenon of fertility postponement in the
most recent years. From the contour plot in Figure 1b, we see the increases in fertility
between ages 20 and 30 from 1940 to 1980, which reflect the baby boom period.


Figure 1: Observed age-specific fertility rates for Australia from 1921 to 2006. In panel
(a), the dashed line represents the data in 1921, while the dotted line represents the data
in 2006. Source: Australian Bureau of Statistics (Cat. No. 3105.0.65.001, Table 38).

As a demonstration, Figure 2 presents the Box-Cox transformed fertility rates for years
1921 and 2006. With different values of $\lambda$, the age profiles change accordingly. The goal
is to select the optimal $\lambda$ that improves model estimation and prediction accuracy for a
chosen model.



[Figure 1 panels: (a) Rainbow plot of fertility rates; (b) Filled contour plot of fertility, with Year on one axis]

[Figure 2 panels: Year 1921; Year 2006]

Figure 2: Box-Cox transformed fertility rates with different values of $\lambda$ in years 1921 and
2006, as two examples.

2.2 Study design


Since the optimal Box-Cox transformation parameter is selected based on an in-sample
forecast error measure, we divide the data into a training sample, a validation sample and
a testing sample. Customarily, the testing sample consists of the last 20% of the data,
which are used to examine the out-of-sample forecast accuracy with the selected Box-Cox
transformation parameter. The validation sample, which covers the same number of years
as the testing sample, is used to select the optimal Box-Cox transformation parameter.
In the case of Australian fertility rates, the training sample is from 1921 to 1972, the
validation sample is from 1973 to 1989, and the testing sample is from 1990 to 2006.
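The three-way split can be reproduced with a few lines of Python. This is only a sketch: the helper name `split_years` and the rule of rounding 20% of the series length to the nearest whole year are assumptions, chosen because they recover the year ranges stated above.

```python
def split_years(first_year, last_year, test_share=0.2):
    """Split a run of years into training, validation and testing blocks.

    The testing block is the last `test_share` of the years (rounded to the
    nearest integer, an assumed rule) and the validation block has the same
    length and sits immediately before it.
    """
    years = list(range(first_year, last_year + 1))
    n_holdout = round(test_share * len(years))
    cut_valid = len(years) - 2 * n_holdout
    cut_test = len(years) - n_holdout
    return years[:cut_valid], years[cut_valid:cut_test], years[cut_test:]


training, validation, testing = split_years(1921, 2006)
print(training[0], training[-1])      # first and last training year
print(validation[0], validation[-1])  # first and last validation year
print(testing[0], testing[-1])        # first and last testing year
```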

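There are various ways to measure forecast accuracy. Following the early work by
Shang, Booth, and Hyndman (2011), we use the mean absolute forecast error (MAFE) for
measuring point forecast accuracy. For forecast horizon $h$, this is given by

$$
\text{MAFE}(h) = \frac{1}{(18-h) \times 35} \sum_{s=1}^{18-h} \sum_{i=15}^{49} \left| f_{s,i} - \hat{f}_{s,i} \right|, \qquad h = 1, \dots, 17,
$$

where $f_{s,i}$ denotes the holdout observation at age $i$ in the $s$th year of the
forecasting period, and $\hat{f}_{s,i}$ denotes its $h$-step-ahead point forecast.
The MAFE is the average of absolute error across ages and years in the forecasting period;
it measures forecast precision regardless of sign and is not sensitive to large relative errors
of small rates. Because the back-transformed forecasts are median forecasts on the original
scale, they are suitable for evaluation using MAFE.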
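As a small illustration of the error measure, the sketch below averages absolute errors over a years-by-ages grid. The function name `mafe` and the toy numbers are assumptions made for the example; they are not values from the fertility data.

```python
def mafe(observed, forecast):
    """Mean absolute forecast error over all (year, age) cells.

    Both arguments are lists of rows, one row per year and one entry per age.
    """
    errors = [
        abs(obs - fc)
        for obs_row, fc_row in zip(observed, forecast)
        for obs, fc in zip(obs_row, fc_row)
    ]
    return sum(errors) / len(errors)


# Two hypothetical years by two hypothetical ages.
observed = [[0.10, 0.20], [0.30, 0.40]]
forecast = [[0.10, 0.25], [0.20, 0.40]]
print(mafe(observed, forecast))  # average of 0, 0.05, 0.10 and 0
```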

In order to evaluate the interval forecast accuracy, we utilize the interval score of
Gneiting and Raftery (2007) and Gneiting and Katzfuss (2014). For each year in the
forecasting period, the one-step-ahead to 17-step-ahead prediction intervals were
calculated at the $(1 - \alpha) \times 100\%$ nominal coverage probability. We consider the common case
of the symmetric $(1 - \alpha) \times 100\%$ prediction interval, with lower and upper bounds that
are predictive quantiles at $\alpha/2$ and $1 - \alpha/2$, denoted by $l$ and $u$. As defined by Gneiting
and Raftery (2007), the scoring rule for the interval forecast, evaluated at the realized
observation $i$, is denoted by $S_{\alpha}(l, u; i)$. This can be expressed as

$$
S_{\alpha}(l, u; i) = (u - l) + \frac{2}{\alpha} \left[ (l - i) I\{i < l\} + (i - u) I\{i > u\} \right], \tag{1}
$$

where $I\{\cdot\}$ represents the binary indicator function which takes the value of 1 when the
condition is met, and $\alpha$ denotes the level of significance. In this paper, $\alpha = 0.2$ since we
construct 80% prediction intervals. The optimal score is achieved when $i$ lies between $l$
and $u$, and the distance between $l$ and $u$ is minimal. The interval score can be interpreted
as follows: a forecaster is rewarded for a narrow prediction interval, if and only if the
true observation lies within the prediction interval. The smaller the interval score is, the
better the method is for producing interval forecasts.

For different ages and years in the forecasting period, the averaged interval score is
defined by

$$
\bar{S}_{\alpha}(h) = \frac{1}{(18-h) \times 35} \sum_{s=1}^{18-h} \sum_{i=15}^{49} S_{\alpha,s}(l, u; i), \qquad h = 1, \dots, 17.
$$
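The interval score in (1) and its average can be sketched in a few lines of Python. The function names, the argument name `x` for the realized observation, and the numeric values are assumptions for illustration only.

```python
def interval_score(lower, upper, x, alpha=0.2):
    """Interval score for a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty of 2 / alpha times the
    distance by which the observation x falls outside the interval.
    """
    penalty = 0.0
    if x < lower:
        penalty = lower - x
    elif x > upper:
        penalty = x - upper
    return (upper - lower) + (2.0 / alpha) * penalty


def mean_interval_score(cases, alpha=0.2):
    """Average the score over (lower, upper, observation) triples."""
    return sum(interval_score(l, u, x, alpha) for l, u, x in cases) / len(cases)


# Hypothetical 80% intervals: one covers the observation, two miss it.
cases = [(0.10, 0.20, 0.15), (0.10, 0.20, 0.25), (0.10, 0.20, 0.05)]
print([interval_score(l, u, x) for l, u, x in cases])
print(mean_interval_score(cases))
```

With $\alpha = 0.2$, each unit of miss distance costs ten units of score, so a narrow interval that misses the observation is penalized far more heavily than a somewhat wider interval that covers it.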


3 Methodology 

Many methods have been proposed for modeling age-specific fertility (see Booth, 2006, 
for reviews). To demonstrate our main idea, we model the observed period age-specific 
fertility, using the well-known Lee-Carter model (Lee and Carter, 1992). Instead of 
retaining only the first component, we retain more than one principal component (see 
also Cairns, Blake, and Dowd, 2006). The modified Lee-Carter model can be defined by 

$$
z_{t,i} = \mu_i + \sum_{k=1}^{K} \beta_{t,k} \phi_{k,i} + \varepsilon_{t,i}, \qquad 1 \le t \le n, \quad 1 \le i \le p,
$$

where $n$ denotes the last year in the training sample and $p$ denotes the last age; $\mu_i$
represents the mean, estimated by $\hat{\mu}_i = \frac{1}{n} \sum_{t=1}^{n} z_{t,i}$;
$\{\hat{\beta}_{1,k}, \dots, \hat{\beta}_{n,k}\}$ represents the $k$th estimated
principal component scores; $\{\hat{\phi}_{k,1}, \dots, \hat{\phi}_{k,p}\}$ represents the $k$th
estimated principal component, which can be obtained from singular value decomposition applied
to the training sample; $\varepsilon_{t,i}$ represents independent and identically distributed
Gaussian white noise; and $K$ represents the number of retained principal components, whose
value can be determined by a ratio-based estimator (see Lam et al., 2011, for details).
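A ratio-based estimator of this kind is commonly written as the index that minimizes the ratio of consecutive eigenvalues. The sketch below implements that generic form in Python; the function name, the `k_max` cap and the eigenvalues are assumptions, and the exact variant used in the paper may differ.

```python
def ratio_based_k(eigenvalues, k_max=None):
    """Pick K as the index k minimizing eigenvalue[k + 1] / eigenvalue[k].

    Eigenvalues are sorted in decreasing order first; k is 1-based.
    """
    values = sorted(eigenvalues, reverse=True)
    if k_max is None:
        k_max = len(values) - 1
    ratios = [values[k] / values[k - 1] for k in range(1, k_max + 1)]
    return ratios.index(min(ratios)) + 1


# Hypothetical eigenvalues with a sharp drop after the second one.
print(ratio_based_k([10.0, 8.0, 0.5, 0.4, 0.3]))
```

The largest relative drop marks the boundary between components that carry signal and those that mostly carry noise, which is why the minimizing index is taken as $K$.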
3.1 Point and interval forecasts


Conditional on the estimated mean $\hat{\mu}_i$ and the estimated principal components
$(\hat{\phi}_{1,i}, \dots, \hat{\phi}_{K,i})$, the point forecasts are given by

$$
\hat{z}_{n+h|n,i} = \hat{\mu}_i + \sum_{k=1}^{K} \hat{\beta}_{n+h|n,k} \hat{\phi}_{k,i},
$$

where $\hat{\beta}_{n+h|n,k}$ represents the $h$-step-ahead point forecast of the $k$th principal component
scores. These forecasts can be obtained from applying a univariate time-series model, such
as an autoregressive integrated moving average (ARIMA) model. We use the auto.arima
algorithm of Hyndman and Khandakar (2008) to select the optimal orders of an ARIMA
model on the basis of an information criterion, such as the corrected Akaike information
criterion (Hurvich and Tsai, 1989) considered in this paper.
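Once the score forecasts are available, assembling the point forecast is a weighted sum of the fixed components added to the mean. The Python sketch below shows only that assembly step; the function name and all numbers are hypothetical, and the score forecasts are simply assumed to be given rather than produced by an ARIMA model.

```python
def point_forecast(mean, components, score_forecasts):
    """Combine the mean age profile with forecast principal component scores.

    mean            : list of length p (one value per age)
    components      : list of K lists, each of length p
    score_forecasts : list of K forecast scores for a single horizon
    """
    return [
        mu + sum(beta * phi[i] for beta, phi in zip(score_forecasts, components))
        for i, mu in enumerate(mean)
    ]


# Hypothetical inputs with p = 3 ages and K = 2 components.
mean = [1.0, 2.0, 3.0]
components = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
score_forecasts = [2.0, 4.0]
print(point_forecast(mean, components, score_forecasts))
```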

Similarly, conditional on the estimated mean and estimated principal components, the
total variance can be approximated by

$$
\operatorname{Var}(z_{n+h|n,i}) \approx \sum_{k=1}^{K} u_{n+h|n,k} \hat{\phi}_{k,i}^{2} + v_{n+h,i},
$$

where $u_{n+h|n,k}$ denotes the estimated variance of the sample principal component scores;
$\hat{\phi}_{k,i}^{2}$ denotes the square of the fixed principal components; and $v_{n+h,i}$ denotes the estimated
variance of the model residual (see also Shang, Booth, and Hyndman, 2011). The 80%
prediction interval of the transformed data can be obtained based on the estimated total
variance and a normality assumption.
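Under the normality assumption, the interval on the transformed scale follows from the total variance and a standard normal quantile. The Python sketch below illustrates the arithmetic for a single age and horizon; the function name and the input values are assumptions.

```python
from statistics import NormalDist


def prediction_interval(point, score_variances, components_at_age, residual_variance, level=0.8):
    """Normal-theory prediction interval on the transformed scale.

    The total variance adds the score forecast variances, each weighted by
    the squared component value at this age, to the residual variance.
    """
    total_variance = sum(
        u * phi ** 2 for u, phi in zip(score_variances, components_at_age)
    ) + residual_variance
    quantile = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = quantile * total_variance ** 0.5
    return point - half_width, point + half_width


# Hypothetical values: total variance = 4 * 0.25 + 1 * 1 + 2 = 4.
lower, upper = prediction_interval(-2.0, [4.0, 1.0], [0.5, 1.0], 2.0)
print(lower, upper)
```

Since the Box-Cox transformation is monotone, applying the inverse transformation to these two endpoints gives an interval with the same nominal coverage on the original scale.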

Having obtained the point and interval forecasts for the transformed data, we then
back-transform these forecasts to the original scale through the inverse Box-Cox
transformation. This can be expressed as

$$
\hat{f}_{n+h|n,i} =
\begin{cases}
\left( \lambda \hat{z}_{n+h|n,i} + 1 \right)^{1/\lambda} & \text{if } \lambda \neq 0, \\
\exp\left( \hat{z}_{n+h|n,i} \right) & \text{if } \lambda = 0.
\end{cases}
$$

3.2 Application to age-specific fertility


In Figure 3, we present principal components and their associated scores for the Australian
fertility data from 1921 to 1989. Based on these data, the forecasts of fertility rates
from 1990 to 2006 are obtained. Although the optimal K is selected by the ratio-based
estimator, we display only the first two components for ease of presentation. In the
top panel, we show the age profile. In the middle panel, we display the time trend of
the principal component scores. In particular, the point forecasts of the scores are shown
as a solid line, whereas the dark and light gray regions represent the 80% and 95%
point-wise prediction intervals, respectively. In the bottom panel, the forecasts of fertility
are obtained by multiplying the fixed principal components by the forecast principal
component scores before adding the main effect.



Figure 3: Principal component decomposition for the Australian fertility data from 1921
to 1989, from which the forecasts for 1990 to 2006 are obtained.











The first principal component models the fertility rates at young ages, whereas the 
second principal component models the fertility rates at older ages. From the forecast 
first principal component scores, it is clear that the fertility trend at young ages is likely 
to decline. Based on the forecast second principal component scores, it is evident that 
the fertility trend at older ages is likely to increase. 


4 Results

4.1 Selection of the optimal parameter 

Section 3.1 presents one method for modeling and forecasting age-specific fertility, but
the main contribution is to present a method to select the optimal transformation parameter
based on in-sample forecast accuracy. To investigate the in-sample forecast accuracy, we
implement the rolling origin approach. Using the initial training sample in the Australian
age-specific fertility, we produce one- to 17-step-ahead point and interval forecasts. Then,
we increase the sample size by one year, re-estimate the model and produce one- to
16-step-ahead forecasts. This process is iterated until the training sample reaches the
last year of the validation sample. This produces 17 one-step-ahead forecasts,
16 two-step-ahead forecasts, and so on, up to one 17-step-ahead forecast. We use these
forecasts to evaluate the forecast accuracy on the validation sample. For a range of forecast
horizons, we calculate the forecast accuracy based on an error measure, such as MAFE or
interval score given in (1), over different ages and years in the validation sample. The optimal
Box-Cox transformation parameter is the one that minimizes the median of a forecast
error measure over a range of forecast horizons. Computationally, the optimization can
be achieved by using the optimize function in R.
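The selection procedure can be sketched end to end in Python under several loudly stated assumptions: a random walk with drift stands in for the modified Lee-Carter model with ARIMA forecasts, a plain golden-section search stands in for R's optimize (which combines golden-section search with parabolic interpolation), and the series is synthetic, built so that a parameter of 0.5 makes it exactly linear. None of the names below come from the paper.

```python
import math
from statistics import median


def box_cox(f, lam):
    return math.log(f) if lam == 0 else (f ** lam - 1.0) / lam


def inverse_box_cox(z, lam):
    return math.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


def drift_forecast(z, h):
    """Random walk with drift: a simple stand-in forecaster (assumption)."""
    slope = (z[-1] - z[0]) / (len(z) - 1)
    return z[-1] + h * slope


def rolling_origin_errors(series, n_train, n_valid, lam):
    """Absolute validation errors grouped by horizon h = 1, ..., n_valid."""
    errors = {h: [] for h in range(1, n_valid + 1)}
    for origin in range(n_train, n_train + n_valid):
        z = [box_cox(f, lam) for f in series[:origin]]  # refit on a longer sample
        for h in range(1, n_train + n_valid - origin + 1):
            forecast = inverse_box_cox(drift_forecast(z, h), lam)
            errors[h].append(abs(series[origin + h - 1] - forecast))
    return errors


def objective(lam, series, n_train, n_valid):
    """Median over horizons of the mean absolute forecast error."""
    errors = rolling_origin_errors(series, n_train, n_valid, lam)
    return median(sum(e) / len(e) for e in errors.values())


def golden_section(fn, lo, hi, tol=1e-6):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = fn(c), fn(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = fn(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = fn(d)
    return (a + b) / 2


# Synthetic positive series of 52 training plus 17 validation "years".
series = [(0.1 + 0.01 * t) ** 2 for t in range(69)]
best = golden_section(lambda lam: objective(lam, series, 52, 17), 0.0, 1.0)
print(best)
```

On this synthetic series the search should settle close to 0.5, the value at which the transformed series is a straight line and the drift forecasts are exact. With real fertility data the objective need not be unimodal, so a grid of starting intervals may be a prudent safeguard.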

In Table 1, we present the selected Box-Cox transformation parameter $\lambda$, based on the
in-sample MAFE and interval score. For the purpose of comparison, we also consider the
log transformation which is commonly used in modeling age-specific mortality. Based on
the averaged MAFE and averaged interval score across 17 horizons, we found that with the
selected Box-Cox transformation parameter, the out-of-sample point and interval forecast
errors can be reduced in comparison with the log transformation for each forecast horizon.


          Point forecast accuracy (MAFE)                       Interval forecast accuracy (score)
h         $\lambda = 0.46$   $\lambda = 0$   $\lambda = 0.4$   $\lambda = 0.46$   $\lambda = 0$
1         0.00117            0.00235         0.00120           0.00543            0.00682
2         0.00152            0.00304         0.00155           0.00732            0.00982
3         0.00219            0.00388         0.00225           0.00936            0.01332
4         0.00285            0.00487         0.00289           0.01174            0.01696
5         0.00352            0.00590         0.00360           0.01412            0.01905
6         0.00414            0.00721         0.00432           0.01651            0.02135
7         0.00487            0.00853         0.00508           0.01942            0.02568
8         0.00564            0.00964         0.00591           0.02134            0.02909
9         0.00635            0.01067         0.00662           0.02400            0.03324
10        0.00697            0.01180         0.00735           0.02610            0.03743
11        0.00758            0.01291         0.00780           0.02865            0.04313
12        0.00819            0.01400         0.00844           0.03097            0.04745
13        0.00894            0.01499         0.00938           0.03314            0.05058
14        0.00962            0.01668         0.01013           0.03584            0.06074
15        0.01032            0.01782         0.01070           0.03912            0.06641
16        0.01068            0.01871         0.01111           0.04118            0.07244
17        0.00992            0.01830         0.01179           0.04337            0.07312
Mean      0.00614            0.01067         0.00648           0.02398            0.03686
Median    0.00635            0.01067         0.00662           0.02400            0.03324

Table 1: The estimated optimal Box-Cox transformation parameters and out-of-sample
point and interval forecast accuracy for different horizons. For each horizon and for the
summary statistics, the minimum MAFE and interval score are attained with $\lambda = 0.46$.


Note that Table 1 is consistent with the results of Hyndman and Booth (2008) who
found that the best point forecast accuracy (for one-step-ahead forecasts) had $\lambda = 0.4$.
In comparison to $\lambda = 0.4$, we found that our selected $\lambda = 0.46$ gives better accuracy for
each horizon, but their differences in point forecast accuracy on the testing sample are
marginal.


4.2 Application to age-specific fertility 

Prior to fitting a modified Lee-Carter model, the raw data are transformed by the
Box-Cox transformation. The Box-Cox transformation may introduce a small bias, but can
potentially reduce variance. As a result, this may improve estimation and forecast
accuracy. In terms of its effect on forecasts of fertility, Figure 4 displays the functional
principal component decomposition for the Box-Cox transformed data with $\lambda = 0.46$.
From the bottom left plot, it is evident that the forecast fertility rates have a similar
shape to the ones using the raw data. However, the age patterns are very different in
shape from the untransformed ones, such as the bimodality shown in the first principal
component. From the forecast first principal component scores, such bimodality is likely
to continue with increasing forecast uncertainties as the horizon increases. The second
principal component shows the contrast between ages around 25 and 40. From the forecast
second principal component scores, such contrast is likely to decrease with increasing
forecast uncertainties as the horizon increases.

Figure 4: Principal component decomposition for the transformed Australian fertility
data from 1921 to 1989, from which the forecasts are obtained from 1990 to 2006.

5 Conclusion and future research


We presented a method and an algorithm for selecting the optimal Box-Cox
transformation parameter. The contributions of this paper are twofold. First, we found that the log
transformation may not be adequate for modeling and forecasting age-specific fertility.
Second, we presented a way of selecting the optimal Box-Cox transformation parameter based
on in-sample forecast accuracy and showed that with the selected Box-Cox
transformation parameter, the out-of-sample point and interval forecast accuracy can be improved.
In addition, our optimal Box-Cox transformation parameter $\lambda = 0.46$ produces slightly
smaller point forecast error in comparison to $\lambda = 0.4$ used in Hyndman and Booth (2008).

The proposed method and algorithm can be extended to select the optimal Box-Cox 
transformation parameter for modeling and forecasting age-specific migration. With the 
selected Box-Cox transformation parameter, the forecast uncertainties associated with 
age-specific components of population change are more likely to be fully captured. Finally, 
from a Bayesian viewpoint, it is also possible to embed the selection of the optimal
Box-Cox transformation parameter into the modeling and forecasting.